Download data from public repositories¶

Analyzing the spatial patterns of microbial traits also poses a significant challenge due to the limited availability of environmental data that correspond to microbiome data, and this dearth of paired datasets is particularly prevalent in the majority of currently accessible public repositories. The geodata R package provides some method to download spatial, climate and soil properties from public repositories, and the MODIS host many remote-sense images of plant metrics and land cover. To a certain extent, this mitigates the difficult of limited environmental data, though the accuracy of their absolute values is relatively lower compared to those measured ones.

Yet, it is extremely challenging for most R language beginners to effectively integrate these datasets for microbial biogeography analysis, as the whole process involves manipulating diverse datasets. To solve this problem, we designed a series of functions to download various spatial, climate, vegetation, and soil parameters from public repositories. Although the accuracy of these datasets are lower than that of measured ones, they can still be associated with microbial biogeographical features at a larger spatial scale. Please note that all data would be resampled to a same spatial resolution based on the first downloaded SpatRaster!

Here we need three R packages for this section of microgeo R package tutorial. Just run the following codes to import them into R environment.

In [1]:
suppressMessages(require("magrittr")) 
require("ggplot2")  %>% suppressMessages()
require("microgeo") %>% suppressMessages()

If the Chinese characters cannot be displayed correctly, please run the following codes to set locale to UTF-8:

In [2]:
prev_locale <- Sys.setlocale("LC_CTYPE", "C.UTF-8")

We need a standard microgeo dataset for the presentations in the section of tutorial.

In [3]:
data(qtp)
map <- read_aliyun_map(adcode = c(540000, 630000, 510000)) %>% suppressMessages() 
dataset.dts.aliyun <- create_dataset(mat = qtp$asv, ant = qtp$tax, met = qtp$met, map = map,
                                     phy = qtp$tre, env = qtp$env, lon = "longitude", lat = "latitude") 
dataset.dts.aliyun %>% show_dataset()
ℹ [2024-06-08 22:17:46] INFO ==> all samples fall within the map area!

ℹ [2024-06-08 22:17:46] INFO ==> dataset has been created successfully!

ℹ [2024-06-08 22:17:46] INFO ==> use `object %>% show_dataset()` to check the summary of dataset.

── The Summary of Microgeo Dataset ─────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`

Now, let's go through each of these functions and see how to download public environmental properties from public repositories.

1. Collect aridity index¶

The get_ai() is implemented to download aridity index from global aridity index database. The resolution of original data is 30''. Here is a simple example.

In [4]:
# Download aridity index for research area 
dataset.dts.aliyun %<>% get_ai(out.dir = "test")
✔ [2024-06-08 22:17:53] SAVE ==> results have been saved to: object$spa$rast$his$AI

In [5]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ─────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ─────────────────────────────────────────
✔ object$spa: 1 historically numeric variables; 0 historically classification variables; 0 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [6]:
# Visualize the aridity index
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q1 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$AI, 
                   color = colorRampPalette(RColorBrewer::brewer.pal(11, "RdYlGn"))(100)) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q2 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
   add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$AI, breaks = c(0.03, 0.2, 0.5, 0.65),
                  color = RColorBrewer::brewer.pal(11, "RdYlGn")[c(1,3,5,9,11)], labels = c("HAR", "AR", "SER", "SHR", "HR")) %>% 
   add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q1, q2, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

2. Collect elevation¶

The get_elev() is implemented to download elevation data from WorldClim database version 2.1. You are allowed to specify a spatial resolution. Here is a simple example based on the resolution of 10'.

In [7]:
# Download elevation for research area 
dataset.dts.aliyun %<>% get_elev(res = 10, out.dir = "test")
✔ [2024-06-08 22:18:04] SAVE ==> results have been saved to: object$spa$rast$his$ELEV

In [8]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ─────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ─────────────────────────────────────────
✔ object$spa: 2 historically numeric variables; 0 historically classification variables; 0 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [9]:
# Visualize the elevation
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q3 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$ELEV) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q4 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
   add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$ELEV, breaks = c(3000, 4000, 5000, 6000), 
                  labels = c("<3000", "3000-4000", "4000-5000", "5000-6000", ">6000")) %>% 
   add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q3, q4, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

3. Collect historical bioclimatic variables¶

The get_his_bioc() is implemented to download 19 historically bioclimatic variables from WorldClim database version 2.1. You are allowed to specify a spatial resolution. Here is a simple example based on the resolution of 10'.

In [10]:
# Download 19 historically bioclimatic variables of research area
dataset.dts.aliyun %<>% get_his_bioc(res = 10, out.dir = "test")
✔ [2024-06-08 22:18:19] SAVE ==> results have been saved to: object$spa$rast$his(19 variables)

In [11]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ─────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ─────────────────────────────────────────
✔ object$spa: 21 historically numeric variables; 0 historically classification variables; 0 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [12]:
# Visualize the Bio1 (Mean annual temperature) 
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q5 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$Bio1) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q6 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
   add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$Bio1, breaks = c(-1, 0, 1), labels = c("A", "B", "C", "D")) %>% 
   add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q5, q6, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image
In [13]:
# Visualize the Bio12 (Mean annual precipitation) 
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q7 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$Bio12) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q8 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
   add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$Bio12, breaks = c(100, 300, 500), labels = c("<100", "100-300", "300-500", ">500")) %>% 
   add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q7, q8, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

4. Collect futrue bioclimatic variables¶

The get_fut_bioc() is implemented to download 19 future bioclimatic variables from WorldClim database version 2.1. You are allowed to specify a spatial resolution. Here is a simple example based on the resolution of 10'.

In [14]:
# Download futrue bioclimatic variables of research area
dataset.dts.aliyun %<>% get_fut_bioc(res = 10, gcm = "ACCESS-CM2", out.dir = "test")
✔ [2024-06-08 22:19:03] SAVE ==> results have been saved to: object$spa$rast$fut [19 variables; 4 groups]

In [15]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ─────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ─────────────────────────────────────────
✔ object$spa: 21 historically numeric variables; 0 historically classification variables; 4 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [16]:
# Visualize the Bio1 (Mean annual temperature) of `ACCESS-CM2|ssp585|2061-2080` 
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q9 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$fut$`ACCESS-CM2|ssp585|2061-2080`$Bio1) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q10 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
   add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$fut$`ACCESS-CM2|ssp585|2061-2080`$Bio1, breaks = c(-1, 0, 1), labels = c("A", "B", "C", "D")) %>% 
   add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q9, q10, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image
In [17]:
# Visualize the Bio12 (Mean annual precipitation) of `ACCESS-CM2|ssp585|2061-2080` 
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q11 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$fut$`ACCESS-CM2|ssp585|2061-2080`$Bio12) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q12 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
   add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$fut$`ACCESS-CM2|ssp585|2061-2080`$Bio12, breaks = c(100, 300, 500), labels = c("<100", "100-300", "300-500", ">500")) %>% 
   add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q11, q12, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

5. Collect numeric metrics from MODIS¶

The get_modis_num_metrics() is implemented to download numeric MODIS metrics from the EOSDIS. Running microgeo::show_modis_num_metrics() can display detailed infomation about these metrics. Regarding the NDVI and ENV, you can specify a spatial resolution (meter). Here is a simple example.

In [18]:
# Show all avaliable numeric MODIS metrics
show_modis_num_metrics()
A data.frame: 28 × 7
measureresolutiontypenamesdsunitscale.factor
<chr><chr><chr><chr><chr><chr><chr>
1NDVI 250 TerraMOD13Q1 250m 16 days NDVIno unit 1e-04
2NDVI 500 TerraMOD13A1 500m 16 days NDVIno unit 1e-04
3NDVI 1000TerraMOD13A2 1 km 16 days NDVIno unit 1e-04
4NDVI 250 Aqua MYD13Q1 250m 16 days NDVIno unit 1e-04
5NDVI 500 Aqua MYD13A1 500m 16 days NDVIno unit 1e-04
6NDVI 1000Aqua MYD13A2 1 km 16 days NDVIno unit 1e-04
7EVI 250 TerraMOD13Q1 250m 16 days EVI no unit 1e-04
8EVI 500 TerraMOD13A1 500m 16 days EVI no unit 1e-04
9EVI 1000TerraMOD13A2 1 km 16 days EVI no unit 1e-04
10EVI 250 Aqua MYD13Q1 250m 16 days EVI no unit 1e-04
11EVI 500 Aqua MYD13A1 500m 16 days EVI no unit 1e-04
12EVI 1000Aqua MYD13A2 1 km 16 days EVI no unit 1e-04
13GPP 500 TerraMOD17A2HGFGpp_500m kg C/m^2 1e-04
14GPP 500 Aqua MYD17A2H Gpp_500m kg C/m^2 1e-04
15PsnNet500 TerraMOD17A2HGFPsnNet_500m kg C/m^2 1e-04
16PsnNet500 Aqua MYD17A2H PsnNet_500m kg C/m^2 1e-04
17NPP 500 TerraMOD17A3HGFNpp_500m kg C/m^2 1e-04
18NPP 500 Aqua MYD17A3HGFNpp_500m kg C/m^2 1e-04
19LST 1000TerraMOD11A1 LST_Day_1km Kelvin 0.02
20LST 1000Aqua MYD11A1 LST_Day_1km Kelvin 0.02
21ET 500 TerraMOD16A2 ET_500m kg/m²/8day0.1
22ET 500 Aqua MYD16A2 ET_500m kg/m²/8day0.1
23PET 500 TerraMOD16A2 PET_500m kg/m²/8day0.1
24PET 500 Aqua MYD16A2 PET_500m kg/m²/8day0.1
25LE 500 TerraMOD16A2 LE_500m J/m²/day 10000
26LE 500 Aqua MYD16A2 LE_500m J/m²/day 10000
27PLE 500 TerraMOD16A2 PLE_500m J/m²/day 10000
28PLE 500 Aqua MYD16A2 PLE_500m J/m²/day 10000
In [19]:
# Download numeric metrics from MODIS
# Please provide correct username and password. Run `?get_modis_num_metrics()` to see more details.
dataset.dts.aliyun %<>% get_modis_num_metrics(username = "username", password = "password", 
                                              date.ran = c("2019-08-01|2019-09-01", "2020-08-01|2020-09-01"),
                                              measures = c("NDVI", "EVI"), out.dir = "test", nums.job = 12)
ℹ [2024-06-08 22:19:20] INFO ==> preparing MODIS product list for searching...

ℹ [2024-06-08 22:19:20] INFO ==> searching avaliable MODIS products...

ℹ [2024-06-08 22:19:20] INFO ==> current product (1/2): MOD13A2 (NDVI|EVI--> 2019-08-01 to 2019-09-01)

ℹ [2024-06-08 22:19:22] INFO ==> current product (2/2): MOD13A2 (NDVI|EVI--> 2020-08-01 to 2020-09-01)

ℹ [2024-06-08 22:19:24] INFO ==> find 48 files with 0.97 GB in total...

ℹ [2024-06-08 22:19:24] INFO ==> downloading all avaliable MODIS products[skip if the file exists]...

ℹ [2024-06-08 22:19:24] INFO ==> preparing the PTVs (Product, Time, Version) for merging remote-sensing images...

ℹ [2024-06-08 22:19:24] INFO ==> converting hdf files to tif files...

ℹ [2024-06-08 22:19:24] INFO ==> current product (1/1): MOD13A2 (convert 48 hdf files into 12 tif files using 12 threads)

ℹ [2024-06-08 22:19:25] INFO ==> calculating average values based on date range...

ℹ [2024-06-08 22:19:25] INFO ==> current measure (1/2): NDVI_061(12 threads)

ℹ [2024-06-08 22:19:25] INFO ==> current measure (2/2): EVI_061(12 threads)

ℹ [2024-06-08 22:19:25] INFO ==> reprojecting the CRS of SpatRaster to epsg:+proj=longlat +datum=WGS84 +no_defs, it takes a while...

✔ [2024-06-08 22:19:37] SAVE ==> results have been saved to: object$spa$rast$his$NDVI

ℹ [2024-06-08 22:19:37] INFO ==> reprojecting the CRS of SpatRaster to epsg:+proj=longlat +datum=WGS84 +no_defs, it takes a while...

✔ [2024-06-08 22:19:49] SAVE ==> results have been saved to: object$spa$rast$his$EVI

In [20]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ─────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ─────────────────────────────────────────
✔ object$spa: 23 historically numeric variables; 0 historically classification variables; 4 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [21]:
# Visualize the NDVI and EVI 
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q13 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$NDVI) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q14 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$EVI) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q13, q14, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

6. Collect classification metrics from MODIS¶

The get_modis_cla_metrics() is used to download classification MODIS metrics from the EOSDIS. Running microgeo::show_modis_cla_metrics() can display detailed infomation about these metrics. Here is a simple example.

In [22]:
# Show all avaliable classification metrics
show_modis_cla_metrics()
A data.frame: 11 × 7
measureresolutiontypenamesdsunitscale.factor
<chr><chr><chr><chr><chr><chr><chr>
1LC_Type1 ModerateCombinedMCD12Q1LC_Type1 no unitNA
2LC_Type2 ModerateCombinedMCD12Q1LC_Type2 no unitNA
3LC_Type3 ModerateCombinedMCD12Q1LC_Type3 no unitNA
4LC_Type4 ModerateCombinedMCD12Q1LC_Type4 no unitNA
5LC_Type5 ModerateCombinedMCD12Q1LC_Type5 no unitNA
6LC_Prop1 ModerateCombinedMCD12Q1LC_Prop1 no unitNA
7LC_Prop2 ModerateCombinedMCD12Q1LC_Prop2 no unitNA
8LC_Prop3 ModerateCombinedMCD12Q1LC_Prop3 no unitNA
9LC_Prop1_AssessmentModerateCombinedMCD12Q1LC_Prop1_Assessmentno unitNA
10LC_Prop2_AssessmentModerateCombinedMCD12Q1LC_Prop2_Assessmentno unitNA
11LC_Prop3_AssessmentModerateCombinedMCD12Q1LC_Prop3_Assessmentno unitNA
In [23]:
# Download classification metrics from MODIS
# Please provide correct username and password. Run `?get_modis_num_metrics()` to see more details.
dataset.dts.aliyun %<>% get_modis_cla_metrics(username = "username", password = "password", 
                                              measures = "LC_Type1", out.dir = "test")
ℹ [2024-06-08 22:19:56] INFO ==> preparing MODIS product list for searching...

ℹ [2024-06-08 22:19:56] INFO ==> searching avaliable MODIS products...

ℹ [2024-06-08 22:19:56] INFO ==> current product (1/1): MCD12Q1 (LC_Type1--> 2022-01-01 to 2022-12-31)

ℹ [2024-06-08 22:19:57] INFO ==> find 8 files with 0.09 GB in total...

ℹ [2024-06-08 22:19:57] INFO ==> downloading all avaliable MODIS products[skip if the file exists]...

ℹ [2024-06-08 22:19:57] INFO ==> preparing the PTVs (Product, Time, Version) for merging remote-sensing images...

ℹ [2024-06-08 22:19:57] INFO ==> converting hdf files to tif files...

ℹ [2024-06-08 22:19:57] INFO ==> current product (1/1): MCD12Q1 (convert 8 hdf files into 1 tif files using 1 threads)

ℹ [2024-06-08 22:19:57] INFO ==> collecting all merged image files...

ℹ [2024-06-08 22:19:57] INFO ==> current measure (1/1): LC_Type1_061

ℹ [2024-06-08 22:19:57] INFO ==> reprojecting the CRS of SpatRaster to epsg:+proj=longlat +datum=WGS84 +no_defs, it takes a while...

✔ [2024-06-08 22:20:25] SAVE ==> results have been saved to: object$spa$rast$cla$LC_Type1

In [24]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ─────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ─────────────────────────────────────────
✔ object$spa: 23 historically numeric variables; 1 historically classification variables; 4 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [25]:
# Subset classification. Due to the high data resolution, it is not recommended to use `add_crs()` for visualization here. Otherwise, it may take a very long time.
options(repr.plot.width = 16.4, repr.plot.height = 8.02)
g.spat.raster <- subset_cla_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$cla$LC_Type1, use.class = c(10, 16)) # Only display grassland and barren
plot_bmap(map = dataset.dts.aliyun$map) %>% add_spatraster(spat.raster = g.spat.raster, color = c("darkgreen", "brown"))
No description has been provided for this image

7. Collect world soil properties¶

The get_soilgrid() is implemented to download soil metrics from SoilGRIDS. In the current version, there are 8 avaliable metrics. Here is a simple example.

In [26]:
# Download soil metrics from the SoilGRIDS for the research region
dataset.dts.aliyun %<>% get_soilgrid(depth = 5, measures = c("phh2o", "soc"), out.dir = "test")
✔ [2024-06-08 22:20:40] SAVE ==> results have been saved to: object$spa$rast$his$phh2o

✔ [2024-06-08 22:20:48] SAVE ==> results have been saved to: object$spa$rast$his$soc

In [27]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ─────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ─────────────────────────────────────────
✔ object$spa: 25 historically numeric variables; 1 historically classification variables; 4 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [28]:
# Visualize the phh2o and soc
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q15 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$phh2o) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q16 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$soc) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q15, q16, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

8. Collect China soil properties¶

The get_soilcn() function is used to process soil metrics downloaded from here. Due to the limitations in copyrights, the spatial data of China soil properties should be manually downloaded. Here is a simple example.

In [29]:
# Process soil metrics of CHINA for a research area
# You should download the .nc files, and then place theme into `test/soilchina_products` before run `get_soilcn()`
dataset.dts.aliyun %<>% get_soilcn(depth = 0.045, measures = c("PH", "SOM"), out.dir = "test")
✔ [2024-06-08 22:21:00] SAVE ==> results have been saved to: object$spa$rast$his$PH

✔ [2024-06-08 22:21:07] SAVE ==> results have been saved to: object$spa$rast$his$SOM

In [30]:
# Show dataset 
dataset.dts.aliyun %>% show_dataset()
── The Summary of Microgeo Dataset ─────────────────────────────────────────────
! object$mat: 6808 ASVs/genes and 1244 samples [need to be subsampled!]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ─────────────────────────────────────────
✔ object$spa: 27 historically numeric variables; 1 historically classification variables; 4 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`
In [31]:
# Visualize the PH and SOM 
options(repr.plot.width = 13.43 * 2, repr.plot.height = 7.9)
q17 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$PH) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
q18 <- plot_bmap(map = dataset.dts.aliyun$map) %>%
    add_spatraster(spat.raster = dataset.dts.aliyun$spa$rast$his$SOM) %>% 
    add_north_arrow() %>% add_scale_bar() %>% add_crs()
cowplot::plot_grid(q17, q18, align = 'hv', ncol = 2, labels = c("a", "b"), label_size = 28) %>% suppressWarnings()
No description has been provided for this image

9. Extract spatial data for each sample¶

After all spatial data have been successfully downloaded, we can use the extract_data_from_spatraster() function to extract these spatial data based on the latitude and longitude of the samples. Here is a simple example.

In [32]:
# Show spatial variable names in the dataset before the extraction 
dataset.dts.aliyun %<>% get_spa_vars()
head(dataset.dts.aliyun$spa$unit)
A data.frame: 6 × 2
measureunit
<chr><chr>
1ELEVm
2AI
3Bio1degree centigrade
4Bio2degree centigrade
5Bio3degree centigrade
6Bio4degree centigrade
In [33]:
# Extract spatial data for samples
dataset.dts.aliyun %<>% extract_data_from_spatraster() %>% suppressWarnings() # A data.frame of historically spatial variables for each sample is avaliable at `dataset.dts.aliyun$spa$tabs`
head(dataset.dts.aliyun$spa$tabs)
! [2024-06-08 22:21:14] WARN ==> Some samples were failed to be applied for extraction. use `remove.na = FALSE` to check them!

✔ [2024-06-08 22:21:14] SAVE ==> results have been saved to: object$spa$tabs

A data.frame: 6 × 28
LC_Type1AIELEVBio1Bio2Bio3Bio4Bio5Bio6Bio7⋯Bio16Bio17Bio18Bio19NDVIEVIphh2osocPHSOM
<fct><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>⋯<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
s1100.47584267.391-1.06805214.7024240.65077779.013814.79816-21.3707636.16893⋯308.059414308.0594167096.1695045.4347.00000058.199977.7817.354
s2100.47584267.391-1.06805214.7024240.65077779.013814.79816-21.3707636.16893⋯308.059414308.0594167096.1695045.4347.00000058.199977.7817.354
s3100.47584267.391-1.06805214.7024240.65077779.013814.79816-21.3707636.16893⋯308.059414308.0594167096.1695045.4347.00000058.199977.7817.354
s4100.47584267.391-1.06805214.7024240.65077779.013814.79816-21.3707636.16893⋯308.059414308.0594167096.1695045.4347.00000058.199977.7817.354
s5100.47584267.391-1.06805214.7024240.65077779.013814.79816-21.3707636.16893⋯308.059414308.0594167096.1695045.4347.00000058.199977.7817.354
s6100.48724269.422-1.07755314.7110740.66435779.054314.78921-21.3888036.17801⋯308.128114308.1281167303.1105289.1116.90000163.499957.7817.354
In [34]:
# Check sample numbers 
dim(dataset.dts.aliyun$spa$tabs)
  1. 1095
  2. 28
In [35]:
# Finally, we tidy up the dataset again 
dataset.dts.aliyun %<>% rarefy_count_table()
dataset.dts.aliyun %<>% tidy_dataset()
dataset.dts.aliyun %>% show_dataset()
ℹ [2024-06-08 22:21:16] INFO ==> the ASV/gene abundance table has been rarefied with a sub-sample depth of 5310

── The Summary of Microgeo Dataset ─────────────────────────────────────────────
ℹ object$mat: 6806 ASVs/genes and 1095 samples [subsample depth: 5310]

ℹ object$ant: 6806 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1095 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6806 tip labels

ℹ object$env: 1095 samples and 10 variables

── The Summary of Biogeographic Traits ─────────────────────────────────────────
✔ object$spa: 27 historically numeric variables; 1 historically classification variables; 4 groups of future climate data

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`